[18.0-fr3] Add node failure tolerations to all service operators and openstackclient #1549

Merged
openshift-merge-bot[bot] merged 2 commits into openstack-k8s-operators:18.0-fr3 from openshift-cherrypick-robot:cherry-pick-1545-to-18.0-fr3 on Jul 31, 2025

@openshift-cherrypick-robot openshift-cherrypick-robot commented Jul 30, 2025

This is an automated cherry-pick of #1545

/assign stuggi

Jira: OSPRH-18450

stuggi added 2 commits July 30, 2025 14:52
These changes ensure OpenStackClient pods are automatically rescheduled
when nodes fail, instead of requiring manual intervention to delete
stuck pods. The 120-second tolerations provide faster failover compared
to the 5min default, while the stuck pod detection handles edge cases
where normal eviction fails.
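
As a rough illustration of the tolerations described above, the pod-spec fragment could be built as follows. This is a minimal sketch, not the operator's actual code: the helper name `node_failure_tolerations` is hypothetical, and the `Exists` operator and `NoExecute` effect are assumptions that mirror the tolerations Kubernetes adds by default with a 300-second timeout.

```python
# Taint keys named in the commit message.
NODE_FAILURE_TAINTS = (
    "node.kubernetes.io/not-ready",
    "node.kubernetes.io/unreachable",
)


def node_failure_tolerations(seconds: int = 120) -> list:
    """Build pod-spec tolerations (hypothetical helper).

    Each entry lets a pod stay bound to a tainted node for `seconds`
    before it is evicted. Operator and effect are assumptions.
    """
    return [
        {
            "key": key,
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": seconds,
        }
        for key in NODE_FAILURE_TAINTS
    ]


# Fragment that would sit under `spec` in a pod template.
pod_spec_fragment = {"tolerations": node_failure_tolerations()}
```

Passing `seconds=300` to the same helper reproduces the 5min default for comparison.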

- Adds tolerations for faster pod eviction (120s vs 5min default)
  * Handle node.kubernetes.io/not-ready taints
  * Handle node.kubernetes.io/unreachable taints
- Force delete stuck pods with grace period 0
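
The commit message does not spell out how a stuck pod is detected, so the following is only a sketch under stated assumptions: a pod is treated as stuck when it is already marked for deletion while its node is not ready. `PodState`, `is_stuck`, and `force_delete_options` are hypothetical names; only the `gracePeriodSeconds` field of the Kubernetes delete options is taken as given.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PodState:
    """Hypothetical, simplified view of a pod for this sketch."""
    name: str
    deletion_timestamp: Optional[datetime]  # None if not being deleted
    node_ready: bool


def is_stuck(pod: PodState, now: datetime,
             wait: timedelta = timedelta(0)) -> bool:
    """Assumed criterion: marked for deletion on a node that is not ready."""
    if pod.deletion_timestamp is None or pod.node_ready:
        return False
    return now - pod.deletion_timestamp >= wait


def force_delete_options() -> dict:
    """Delete options requesting immediate removal (grace period 0)."""
    return {"gracePeriodSeconds": 0}


now = datetime(2025, 7, 30, 15, 0, tzinfo=timezone.utc)
pod = PodState("openstackclient", now - timedelta(seconds=10), node_ready=False)
if is_stuck(pod, now):
    print(pod.name, force_delete_options())
```

The manual equivalent of the last step is `kubectl delete pod <name> --grace-period=0 --force`.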

Note:
- Going lower than 120s could be too aggressive and result in pod
  eviction, e.g. during a network issue or a kubelet restart.
- In a follow-up, the same tolerations should be added to the operator
  controller manager deployments, since the
  openstack-operator-controller-manager is the one handling the
  openstackclient pod.

Jira: OSPRH-18450

Signed-off-by: Martin Schuppert <mschuppert@redhat.com>

This change adds 120s tolerations for the node.kubernetes.io/not-ready
and node.kubernetes.io/unreachable taints to reduce pod failover time
during a node failure.

The total eviction time is then ~160s (versus 5min+ with the default).
120s was chosen to prevent pod rescheduling, e.g. on kubelet restarts
or network issues.
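
The ~160s figure can be reproduced with simple arithmetic. The sketch below assumes roughly 40s pass before the failed node is tainted; that detection delay is not stated in the commit message and depends on cluster settings, so treat the constant as an assumption.

```python
# Assumption: about 40s until the node is marked not-ready/unreachable.
DETECTION_SECONDS = 40


def total_eviction_seconds(toleration_seconds: int,
                           detection_seconds: int = DETECTION_SECONDS) -> int:
    """Time from node failure to pod eviction: detection + toleration."""
    return detection_seconds + toleration_seconds


print(total_eviction_seconds(120))  # 160, with the 120s tolerations
print(total_eviction_seconds(300))  # 340, with the 5min default
```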

Jira: OSPRH-18450

Signed-off-by: Martin Schuppert <mschuppert@redhat.com>

@abays abays left a comment

Contributor

/lgtm

openshift-ci Bot commented Jul 30, 2025

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abays, openshift-cherrypick-robot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

stuggi commented Jul 31, 2025

Contributor

/retest

stuggi commented Jul 31, 2025

Contributor

/retest

@openshift-merge-bot openshift-merge-bot Bot merged commit 9b4401f into openstack-k8s-operators:18.0-fr3 Jul 31, 2025
9 checks passed